General Disclaimer 


One or more of the Following Statements may affect this Document 


• This document has been reproduced from the best copy furnished by the 
organizational source. It is being released in the interest of making available as 
much information as possible. 


• This document may contain data, which exceeds the sheet parameters. It was 
furnished in this condition by the organizational source and is the best copy 
available. 


• This document may contain tone-on-tone or color graphs, charts and/or pictures, 
which have been reproduced in black and white. 


• This document is paginated as submitted by the original source. 


• Portions of this document are not fully legible due to the historical nature of some 
of the material. However, it is the best reproduction available from the original 
submission. 


Produced by the NASA Center for Aerospace Information (CASI) 





NASA TECHNICAL
MEMORANDUM

NASA TM X-73347

(NASA-TM-X-73347) EVALUATION CRITERIA FOR
SOFTWARE CLASSIFICATION INVENTORIES,
ACCURACIES, AND MAPS (NASA) 27 p HC A03/MF A01
CSCL 05E

N77- 10b08
Unclas
G3/43 07989

EVALUATION CRITERIA FOR SOFTWARE CLASSIFICATION
INVENTORIES, ACCURACIES, AND MAPS

By Robert R. Jayroe, Jr.
Data Systems Laboratory

September 1976

NASA

George C. Marshall Space Flight Center
Marshall Space Flight Center, Alabama

MSFC - Form 3190 (Rev June 1971)


TECHNICAL REPORT STANDARD TITLE PAGE

1. REPORT NO.: NASA TM X-73347
2. GOVERNMENT ACCESSION NO.:
3. RECIPIENT'S CATALOG NO.:
4. TITLE AND SUBTITLE: Evaluation Criteria for Software Classification Inventories, Accuracies, and Maps
5. REPORT DATE: September 1976
6. PERFORMING ORGANIZATION CODE:
7. AUTHOR(S): Robert R. Jayroe, Jr.
8. PERFORMING ORGANIZATION REPORT NO.:
9. PERFORMING ORGANIZATION NAME AND ADDRESS:
   George C. Marshall Space Flight Center
   Marshall Space Flight Center, Alabama 35812
10. WORK UNIT NO.:
11. CONTRACT OR GRANT NO.:
12. SPONSORING AGENCY NAME AND ADDRESS:
    National Aeronautics and Space Administration
    Washington, D.C. 20546
13. TYPE OF REPORT & PERIOD COVERED: Technical Memorandum
14. SPONSORING AGENCY CODE:
15. SUPPLEMENTARY NOTES:
    This work was sponsored by the Office of Applications' Data Management Program.
    Prepared by Data Systems Laboratory, Science and Engineering

16. ABSTRACT

The main tool for comparing remote sensing classification results with ground truth
information is a contingency table derived from overlaying digital classification and ground
truth maps. The purpose of this report is to explore methods of deriving a maximum amount
of information from the contingency table and of modifying the contingency table to provide
more information. This report contains 15 different statistical criteria derived from a
contingency table that can be used to evaluate tabular classification results, which
unfortunately provide little information on the visual characteristics of a classification map.
Tabular results provide information relating mainly to how much rather than where, which
is the purpose of a map. Therefore, modifications are proposed to the contingency table
which contain information on the spatial complexity of the test site, on the relative location
of classification errors, on how well the classification maps agree with the ground truth
maps, and which reduce back to the original information normally contained in a contingency
table.


17. KEY WORDS

Classification
Remote sensing
Classification Technique Evaluation

18. DISTRIBUTION STATEMENT

Categories 43 and 61

19. SECURITY CLASSIF. (of this report): Unclassified
20. SECURITY CLASSIF. (of this page): Unclassified
21. NO. OF PAGES: 27
22. PRICE: NTIS










ACKNOWLEDGMENT 


The author wishes to acknowledge Dr. R. Atkinson, Dr. H. Ramapriyan,
Dr. B. Dasarthy, and Mr. M. Lybanon of the Computer Science Corporation,
Huntsville, Alabama, for programming the software that provided the results
for this report.


TABLE OF CONTENTS 


Page 


I. INTRODUCTION 1

II. MATHEMATICAL DESCRIPTION OF EVALUATION CRITERIA 3

III. EVALUATION RESULTS 7

IV. PROPOSED STATISTICAL TESTS FOR EVALUATING CM AND/OR GTM 14


LIST OF TABLES 


Table Title Page

1. General 5 by 5 Contingency Table 2

2. Contingency Table for GTM Versus MLCM 8

3. Contingency Table for GTM Versus LCM 9

4. Contingency Table for GTM Versus MLMCM 9

5. Contingency Table for GTM Versus LMCM 10

6. Contingency Table for GTM Versus DSCM 10

7. Statistic Versus Technique 11

8. Technique Versus Chi-Squared Values Using Table 7 Data 12

9. Inventory Accuracy Versus Chi-Squared 12

10. GTM/GTM Contingency Matrix for Each Feature 18

11. GTM/GTM Contingency Matrix for All Features 19

12. MLCM/GTM Contingency Matrix for Each Feature 20

13. MLCM/GTM Contingency Matrix for All Features 21


TECHNICAL MEMORANDUM X-73347 


EVALUATION CRITERIA FOR SOFTWARE CLASSIFICATION
INVENTORIES, ACCURACIES, AND MAPS

I. INTRODUCTION 


Considerable emphasis is now being given to the evaluation of image
classification and compression techniques. This report describes the evaluation
criteria and procedures that have been proposed and developed to focus attention
on the existing state of the art and provide guidance for future research efforts.
Although there are many criteria, e.g., costs, running times, computer resources,
etc., that should be considered in evaluating techniques, the main emphasis of
this report is on statistical performance.

Assume that multispectral image data have been classified using a
particular technique to produce a classification map (CM) and that the CM has
been overlaid with a digital version of a ground truth map (GTM). The normal
procedure is to produce a contingency table, such as shown in Table 1, and
determine a percentage accuracy as a measure of the goodness of a classification
technique. However, there would appear to be considerable risk involved in
judging the merits of various classification techniques based upon this one
number. Hence, one of the purposes of this report is to mathematically explore
the contingency table to determine how much additional information can be
extracted. However, it must also be kept in mind that the table only provides
numerical results and contains relatively little information concerning the map
producing abilities of the various classification techniques. The desired end
result is a sufficient number of mathematical criteria that
can be examined to ensure as much completeness in the evaluation as possible.
Criteria and procedures similar to those discussed in this report can also be
adapted to evaluate compression and change detection analysis results.

The contingency tables used in this report resulted from a cooperative
evaluation of classification techniques which involved Marshall Space Flight
Center, Huntsville, Alabama, and the Tennessee State Planning Office, Nashville,
Tennessee. Landsat data from the Bald Knob, Tennessee, Quadrangle
were used as a test site and four sets of seasonal data were also included for
multitemporal evaluation. All of the techniques discussed in this report are
supervised techniques and all used the same training areas for the classification
results. The five classification results that are discussed include the Gaussian
Maximum Likelihood which was used on one season of data as well as all four
seasons simultaneously, the Linear Classifier Model which was also used on
one season as well as all four seasons, and the Density Slicing Classifier which
was used on only one season of data. The Linear Classifier uses hyperplanes
to separate feature categories, while the Density Slicing Method selects a
channel of data as well as a class interval in that channel to separate feature
categories.

Section II describes contingency tables and tests derived from the tables
in a general manner, and Section III describes the evaluation of the classification
analysis results. Section IV describes a proposed approach for evaluating
classification maps that reduces back to the normally used contingency table.


TABLE 1. GENERAL 5 BY 5 CONTINGENCY TABLE

GTM \ CM   π_n      1      2      3      4      5      e      π_e
1          π_n11    n_11   n_12   n_13   n_14   n_15   e_1    π_e1
2          π_n22    n_21   n_22   n_23   n_24   n_25   e_2    π_e2
3          π_n33    n_31   n_32   n_33   n_34   n_35   e_3    π_e3
4          π_n44    n_41   n_42   n_43   n_44   n_45   e_4    π_e4
5          π_n55    n_51   n_52   n_53   n_54   n_55   e_5    π_e5
o                   o_1    o_2    o_3    o_4    o_5    N_T    %c
π_o                 π_o1   π_o2   π_o3   π_o4   π_o5   %I     N_c


II. MATHEMATICAL DESCRIPTION OF EVALUATION CRITERIA 


Table 1 shows the general form of a 5 by 5 contingency table that is
consistent in size with the tables used in Section III. The table indicates that there
are five categories on the GTM being compared with five categories on the CM.
The elements $n_{ij}$ tell how many pixels in class j on the CM occur at the
same locations as pixels in class i on the GTM. The symbol $e_i$ is the number
of pixels belonging to category i on the GTM and is the number that is expected
to be obtained from the classification results. The symbol $o_j$ is the number of
pixels that were classified in category j on the CM or the number that is
observed, which is usually different from what is expected. Mathematically
speaking

$$e_i = \sum_j n_{ij} \quad \text{and} \quad o_j = \sum_i n_{ij} .$$ (1)

The symbols $\pi$ are probabilities of occurrence,

$$\pi_{e_i} = e_i / N_T , \quad \pi_{o_i} = o_i / N_T , \quad \text{and} \quad \pi_{n_{ii}} = n_{ii} / N_T ,$$ (2)

where $N_T$ is the total number of pixels. The symbols $N_c$, %c, and %I are
the number of correctly classified pixels, the classification accuracy, and
inventory accuracy, respectively. These are computed using the following
relations:

$$N_c = \sum_i n_{ii} , \quad \%c = 100\,(N_c / N_T) , \quad \text{and} \quad \%I = 100 \left[ 1 - \frac{1}{2 N_T} \sum_j |e_j - o_j| \right] .$$ (3)

For the inventory accuracy, the number wrong is given by the summation
of the absolute value differences, which has to be divided by two. The factor of
two is necessary because if one pixel changes category two columns are affected
on the contingency table and the pixels are in effect counted twice. The inventory
accuracy can also be computed by choosing the smaller of $e_j$ or $o_j$, summing
over the categories, and multiplying by $100/N_T$, which gives the same result.
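
The definitions above can be illustrated with a short Python sketch. It uses the GTM versus MLCM counts of Table 2; cells that are illegible in this copy were filled in from the printed row and column totals, which is an assumption, and the function names are hypothetical rather than taken from the original software.

```python
# Rows are GTM classes (u, t, a, f, w); columns are CM classes in the same order.
# Assumption: illegible cells were recovered from the printed row/column totals.
TABLE_2 = [
    [59, 47, 35, 134, 1],
    [129, 163, 142, 403, 7],
    [2325, 751, 5904, 582, 14],
    [1136, 2404, 745, 17488, 238],
    [40, 95, 11, 218, 315],
]


def inventories(n):
    """Return (e, o): ground truth row sums and classification column sums."""
    e = [sum(row) for row in n]
    o = [sum(col) for col in zip(*n)]
    return e, o


def classification_accuracy(n):
    """Percent of pixels on the diagonal (%c)."""
    total = sum(map(sum, n))
    correct = sum(n[i][i] for i in range(len(n)))
    return 100.0 * correct / total


def inventory_accuracy(n):
    """%I from half the summed absolute inventory differences."""
    e, o = inventories(n)
    wrong = sum(abs(ei - oi) for ei, oi in zip(e, o)) / 2.0
    return 100.0 * (1.0 - wrong / sum(e))


def inventory_accuracy_min(n):
    """%I from the smaller of e and o in each category; same result."""
    e, o = inventories(n)
    return 100.0 * sum(min(ei, oi) for ei, oi in zip(e, o)) / sum(e)
```

Both inventory accuracy functions should agree, which is the equivalence stated in the paragraph above.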





Two other tables can be generated from the actual contingency table;
however, it is not necessary to do so because the actual table already contains
the information. These two tables will be discussed to illustrate the concepts of
randomness and optimumness.

The concept of randomness is illustrated using the maximum likelihood
estimators. The likelihood of an observed sample of $N_T$ being picked from an
assumed population, i.e., $e_i$ and $o_j$ are given and remain constant under all
conditions, is tantamount to replacing $n_{ij}$ with $e_i o_j / N_T$ or $\pi_{o_j} e_i$ in the
contingency table. The only other quantities that change in the table are $N_c$, the
number of correctly classified pixels, and %c. This result should hold true for
any sample of size $N_T$ picked from an assumed population and should be a
random or "worst case" classification accuracy that is expected.

The optimum case classification accuracy that can be expected for a
given inventory ($e_i$ and $o_j$ given) occurs when the classification accuracy
equals the inventory accuracy. This is tantamount to replacing $n_{ii}$ with the
smaller of $e_i$ or $o_i$ on the diagonal, and the remaining $n_{ij}$ ($i \neq j$) will either
be zero or indeterminate. The only other quantities that are changed are again
$N_c$ and %c.
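
As a minimal sketch of the random and optimum cases just described, the Python fragment below computes both limiting accuracies from the inventories alone. The inventories are those printed for GTM versus MLCM in Table 2; the function names are hypothetical.

```python
# Ground truth (E) and classification (O) inventories for u, t, a, f, w,
# as printed in Table 2.
E = [276, 844, 9576, 22011, 679]
O = [3689, 3460, 6837, 18825, 575]


def random_accuracy(e, o):
    """Accuracy when each diagonal n_ii is replaced by e_i * o_i / N_T."""
    n_t = sum(e)
    n_rc = sum(ei * oi / n_t for ei, oi in zip(e, o))
    return 100.0 * n_rc / n_t


def optimum_accuracy(e, o):
    """Accuracy when each diagonal n_ii is replaced by min(e_i, o_i)."""
    return 100.0 * sum(min(ei, oi) for ei, oi in zip(e, o)) / sum(e)


print(round(random_accuracy(E, O), 2), round(optimum_accuracy(E, O), 2))
```

For this inventory the optimum case coincides with the inventory accuracy, as the text states, and the random case falls well below the printed classification accuracy.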

There are several statistical performance criteria that can now be
calculated from the contingency table and these are discussed as follows:

1. The first criterion is the actual classification accuracy. The
classification accuracies for the random and optimum cases provide lower
and upper limits, respectively, for the accuracy range, and a percent of optimum
accuracy can be computed for a technique as a measure of how well
it performed versus how well it could have performed.

The remaining criteria are concerned with chi-squared tests that are convenient
to use because the table contains information related to what is expected and
what is observed. The chi-squared tests and formulas for computing the
chi-squared values relating to those tests are as follows:


2. Hypothesis: The distribution ($o_j$) of the classification inventory
agrees with the distribution ($e_j$) of the ground truth inventory:

$$\chi_1^2 = \sum_j \frac{(o_j - e_j)^2}{e_j} = \frac{1}{N_T} \sum_j \frac{o_j^2}{\pi_{e_j}} - N_T .$$ (4)

3. Hypothesis: The distribution ($n_{ii}$) of the correctly classified pixels
agrees with the distribution of the ground truth inventory:

$$\chi_2^2 = \frac{1}{N_c} \sum_i \frac{n_{ii}^2}{\pi_{e_i}} - N_c .$$ (5)

4. Hypothesis: The distribution of the number of correctly and
incorrectly classified pixels is optimum with respect to the given
inventory and without regard to class:

[Equation (6), for $\chi_3^2$, is not legible in this copy.]

These three chi-squared values should be as small as possible to satisfy the
hypotheses, while the remaining chi-squared values to be discussed should be
as large as possible so that the hypotheses will be rejected.


5. Hypothesis: The correctly classified pixels are randomly distributed:

$$\chi_4^2 = \frac{N_{Rc}}{N_{Ac}} \sum_i \frac{n_{ii}^2}{\pi_{o_i} e_i} - N_{Ac} ,$$ (7)

where $N_{Ac}$ and $N_{Rc}$ are the number of correctly classified pixels for
the actual and random case.

6. Hypothesis: Each classification feature is randomly distributed
among the ground truth features according to the classification
inventory:

$$\chi_5^2 = \frac{1}{e_i} \sum_j \frac{n_{ij}^2}{\pi_{o_j}} - e_i ,$$ (8)

where i refers to the feature on the GTM and j refers to the
feature on the CM.

7. Hypothesis: Each ground truth feature is randomly distributed
among the classification features according to the ground truth
inventory:

$$\chi_6^2 = \frac{1}{o_j} \sum_i \frac{n_{ij}^2}{\pi_{e_i}} - o_j ,$$ (9)

where i and j have the same meaning as in equation (8).

8. Hypothesis: The number of correctly and incorrectly classified
pixels are randomly distributed without regard to class:

$$\chi_7^2 = \frac{\left[ \sum_i (n_{ii} - \pi_{o_i} e_i) \right]^2}{\sum_i \pi_{o_i} e_i} + \frac{\left[ \sum_i (n_{ii} - \pi_{o_i} e_i) \right]^2}{N_T - \sum_i \pi_{o_i} e_i} .$$ (10)

9. Hypothesis: The number of correctly and incorrectly classified
pixels for a particular class are randomly distributed:

$$\chi_8^2 = \frac{(n_{jj} - \pi_{o_j} e_j)^2}{\pi_{o_j} e_j} + \frac{(n_{jj} - \pi_{o_j} e_j)^2}{e_j - \pi_{o_j} e_j} ,$$ (11)

where j represents the class.

10. Hypothesis: The distribution of the classified pixels is independent
of the ground truth:

$$\chi_9^2 = N_T \left( \sum_i \sum_j \frac{n_{ij}^2}{e_i o_j} - 1 \right) .$$ (12)

11. The final criterion is the coefficient of contingency, which is similar
to a correlation coefficient and is calculated from $\chi_9^2$. The
coefficient is given by

$$C = \left[ \frac{\chi_9^2}{N_T (k - 1)} \right]^{1/2} ,$$ (13)

where k is either the number of features on the GTM or CM,
whichever is smaller.
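
The tests for hypotheses 2, 3, and 10 and the coefficient of contingency can be sketched in Python as follows. This is an illustration under stated assumptions, not the original software: the function names are hypothetical, the Table 2 cells that are illegible in this copy were filled in from the row and column totals, and the coefficient is assumed to take the form sqrt(chi-squared / (N_T (k - 1))).

```python
import math

# GTM (rows) versus MLCM (columns) counts from Table 2.
# Assumption: illegible cells were recovered from the printed totals.
TABLE_2 = [
    [59, 47, 35, 134, 1],
    [129, 163, 142, 403, 7],
    [2325, 751, 5904, 582, 14],
    [1136, 2404, 745, 17488, 238],
    [40, 95, 11, 218, 315],
]


def margins(n):
    e = [sum(row) for row in n]          # ground truth inventory
    o = [sum(col) for col in zip(*n)]    # classification inventory
    return e, o, sum(e)


def chi2_inventory(n):
    """Hypothesis 2: classification inventory agrees with ground truth."""
    e, o, _ = margins(n)
    return sum((oj - ej) ** 2 / ej for ej, oj in zip(e, o))


def chi2_correct(n):
    """Hypothesis 3: correctly classified pixels follow the ground truth
    inventory distribution (expected count is pi_e_i * N_c)."""
    e, _, n_t = margins(n)
    n_c = sum(n[i][i] for i in range(len(n)))
    return sum((n[i][i] - e[i] / n_t * n_c) ** 2 / (e[i] / n_t * n_c)
               for i in range(len(n)))


def chi2_independence(n):
    """Hypothesis 10: classified pixels are independent of ground truth."""
    e, o, n_t = margins(n)
    s = sum(n[i][j] ** 2 / (e[i] * o[j])
            for i in range(len(e)) for j in range(len(o)))
    return n_t * (s - 1.0)


def contingency_coefficient(n):
    """Assumed form: sqrt(chi2 / (N_T * (k - 1))), k = smaller dimension."""
    _, _, n_t = margins(n)
    k = min(len(n), len(n[0]))
    # max() guards against a tiny negative value from rounding error.
    return math.sqrt(max(0.0, chi2_independence(n)) / (n_t * (k - 1)))
```

A table whose rows are proportional to one another gives a chi-squared value of zero for independence, while a purely diagonal table gives a coefficient of one, which is a quick sanity check on the assumed form.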

For relatively comparing various classification techniques, the best
values observed for all the chi-squared tests can be chosen as the expected
chi-squared values. The actual observed chi-squared values for a particular
technique can then be measured against what is expected by computing chi-squared
values. The use of these criteria is illustrated in the next section.


III. EVALUATION RESULTS


Tables 2 through 6 are the contingency tables for the various techniques
being examined, and u, t, a, f, and w are the feature categories urban,
transportation, agriculture, forest, and water, respectively. The techniques are
identified by the labels:

MLCM: Maximum Likelihood Classifier (Map)

LCM: Linear Classifier (Map)

MLMCM: Maximum Likelihood Multitemporal Classifier (Map)

LMCM: Linear Multitemporal Classifier (Map)

DSCM: Density Slicing Classifier (Map)

All of the classification programs are supervised techniques, and all
programs were supplied the same training areas. The multitemporal programs
used 16 channels of seasonal data rather than one season containing only 4
channels. Thus, all of the results have one season of data in common.

TABLE 2. CONTINGENCY TABLE FOR GTM VERSUS MLCM

GTM \ MLCM   π_n       u       t       a       f       w       e      π_e
u            .2138     59      47      35     134       1     276    .0083
t            .1931    129     163     142     403       7     844    .0253
a            .6165   2325     751    5904     582      14    9576    .2868
f            .7945   1136    2404     745   17488     238   22011    .6593
w            .4639     40      95      11     218     315     679    .0203
o                    3689    3460    6837   18825     575   N_T = 33386   %c = 71.67
π_o                 .1105   .1036   .2048   .5639   .0172   %I = 81.94    N_c = 23928



Table 7 lists the statistical criteria as a function of classification
technique, and the numbers followed by an asterisk indicate the best numbers that
were observed. The degrees of freedom (df) associated with each chi-squared
value are also listed in Table 7. Of the 26 possible best numbers, MLCM has 6
of them, LCM has 3, MLMCM has 11, LMCM has 4, and DSCM has 2. By using
the numbers followed by an asterisk as expected values, a chi-squared value can
be computed for each technique that has n-1 or 25 df. These chi-squared
values are listed in Table 8.

Tables 2 through 8 represent a considerable amount of information that
needs an equal amount of discussion. First, for 1 df there is a 0.05 probability
of finding a chi-squared value larger than 3.841 and a 0.01 probability of finding
a value larger than 6.635. For 4 df the 0.05 and 0.01 chi-squared values are
9.488 and 13.277; for 16 df the 0.05 and 0.01 values are 26.296 and 32.0; and
for 25 df the 0.05 and 0.01 values are 37.652 and 44.314.


TABLE 5. CONTINGENCY TABLE FOR GTM VERSUS LMCM

GTM \ LMCM   π_n       u       t       a       f       w       e      π_e
u            .1775     49      35      43     148       1     276    .0083
t            .1765     89     149     160     443       3     844    .0253
a            .6638   1714     839    6357     650      16    9576    .2868
f            .8454    875    1267    1044   18609     216   22011    .6593
w            .4212     34      72      21     266     286     679    .0203
o                    2761    2362    7625   20116     522   N_T = 33386   %c (illegible)
π_o                 .0827   .0707   .2284   .6025   .0156   %I = 88.01    N_c (illegible)



TABLE 6. CONTINGENCY TABLE FOR GTM VERSUS DSCM

GTM \ DSCM   π_n       u       t       a       f       w       e      π_e
u            .2464     68      51      33     124       0     276    .0083
t            .1955    159     165     117     401       2     844    .0253
a            .5473   2767     609    5241     944      15    9576    .2868
f            .7497   1658    3085     595   16502     171   22011    .6593
w            .3608     43      44      12     335     245     679    .0203
o                    4695    3954    5998   18306     433   N_T = 33386   %c = 65.86
π_o                 .1406   .1184   .1797   .5483    .013   %I = 77.45    N_c = 21988

TABLE 8. TECHNIQUE VERSUS CHI-SQUARED VALUES
USING TABLE 7 DATA

Technique      MLCM     LCM     MLMCM    LMCM    DSCM
Chi-squared    48950    5100    1597     6409    207832



Comparing the critical chi-squared values quoted for 1, 4, 16, and 25 df with
Tables 7 and 8 shows that every single hypothesis was rejected
and hardly any of the chi-squared values are even close to these numbers. An
attempt was made to understand why the chi-squared values are so large by
using the inventory from MLCM and computing the chi-squared values as a
function of inventory accuracy. Equation (3) shows that the proportion of
wrongly classified pixels for each category j is given by

$$\frac{e_j - o_j}{\sum_j |e_j - o_j|} .$$ (14)

If it is assumed that these proportions remain constant for any inventory
accuracy, then Table 9 shows the inventory and chi-squared values which result from
this assumption.
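
The constant-proportion assumption can be sketched in Python as below. The MLCM inventory is taken from Table 2; the function names are hypothetical, and the sketch keeps fractional pixel counts, whereas the tabulated inventories appear to have been rounded to whole pixels before the chi-squared values were computed, so the results agree only approximately.

```python
# Ground truth (E) and MLCM classification (O) inventories for u, t, a, f, w.
E = [276, 844, 9576, 22011, 679]
O = [3689, 3460, 6837, 18825, 575]


def scaled_inventory(e, o, target_pct):
    """Inventory with the requested inventory accuracy (percent), keeping
    each category's share of the total inventory error fixed."""
    n_t = sum(e)
    total_diff = sum(abs(ej - oj) for ej, oj in zip(e, o))
    # Signed share of the inventory error carried by each category.
    shares = [(oj - ej) / total_diff for ej, oj in zip(e, o)]
    # Number of wrong pixels; the absolute differences sum to twice this.
    wrong = (1.0 - target_pct / 100.0) * n_t
    return [ej + 2.0 * wrong * s for ej, s in zip(e, shares)]


def chi2_inventory(e, o):
    """Chi-squared of the classification inventory against ground truth."""
    return sum((oj - ej) ** 2 / ej for ej, oj in zip(e, o))


for pct in (99.5, 99, 95, 90, 85, 80):
    inv = scaled_inventory(E, O, pct)
    print(pct, [round(v) for v in inv], round(chi2_inventory(E, inv), 2))
```

Because every difference scales by the same factor, the chi-squared value grows with the square of the inventory error, which is why even a 99.5 percent inventory accuracy fails the 4 df test.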

TABLE 9. INVENTORY ACCURACY VERSUS CHI-SQUARED

Inventory Category, o_j
u       t       a       f       w             %I      χ²_1      Optimum χ²
370     916     9500    21923   676           99.5    39.12     0.1187
465     989     9424    21835   673           99      158.21    0.4997
1221    1568    8817    21129   650           95      3953      13.685
2166    2293    8059    20247   622           90      15817     58.15
3110    3017    7300    19365   (illegible)   85      35564     139
4055    3742    6542    18483   564           80      63239     263



Thus, it is not possible to accept the hypothesis that the distribution of the
classification inventory agrees, in a statistically significant sense, with the ground
truth inventory even though the inventory is 99.5 percent correct. If the optimum
classification accuracy is considered, then the chi-squared value is almost
significant at 95 percent classification accuracy. Hence, it appears that the
chi-squared tests are extremely strict, but because of this they also appear to be
extremely good at discriminating between the relative performance of various
techniques.

Tables 7 and 8 show that different conclusions would be obtained if the
techniques were judged on classification accuracy only versus a set of criteria.
Presumably, the set of criteria provides for better judgment because it offers
a more complete description of performance.

In Table 7, $\chi_1^2$ shows that MLMCM benefited the most from the use of
multitemporal data even though the classification accuracy increased less than
2 percent and the inventory accuracy less than 1 percent. This indicates that
the inventory distribution improved considerably, and the inventory has to be
relied on when there are no ground truth results. The inventory accuracy is
usually higher than the classification accuracy because the misclassified pixels
assigned to a category tend to cancel out that category's shortfall of correctly
classified pixels.

The values for $\chi_2^2$ show that the correctly classified pixels are better
estimators of the ground truth inventory distribution than the classification
inventory. Hence the error-cancelling effect of the correctly and incorrectly
classified pixels is not all that good. The values for $\chi_3^2$ also show that the
distribution of correctly and incorrectly classified pixels is nowhere near
optimum, but $\chi_7^2$ shows that they are closer to being optimally distributed than
randomly distributed. The values for $\chi_4^2$ also show that the correctly classified
pixels are closer to being optimally distributed than randomly distributed.

The values for $\chi_5^2$ show that each feature is not randomly classified,
although the categories urban and transportation are highly suspect. In all
cases, the agriculture category is the least randomly classified even though it
is not the most accurately classified or largest category. The values for $\chi_6^2$
show that the ground truth category transportation is highly suspect of randomly
occurring in places classified as other categories. This test was used primarily
to determine if the number of misclassified pixels for a particular category
were distributed in proportion to the population of the other ground truth
categories. The values of $\chi_8^2$ indicate that the number of correctly and
incorrectly classified pixels for each category are not randomly distributed, but
again the urban and transportation categories are suspect.




The values for $\chi_9^2$ and C show that the contingency table distribution
does not indicate independence of the ground truth and classification results, but
a 45 percent "correlation" is nothing to be proud of either. Hence, it appears
that the classification performance was rather dismal for this test site. Although
Table 8 indicates that MLMCM had the best performance, the chi-squared value
is still too large when measured against the best possible performance of all the
techniques.

There may be several reasons why the performance of the techniques is
lower than expected. The first is that the best season may not have been chosen
for those techniques that used only one set of data. Secondly, the test site is
rather small (33386 pixels) as test sites go. The observation was made that
the majority of classification errors occurred at the boundary of two or more
different features and that the homogeneous areas were classified consistently
accurately. Hence, if the test site had been expanded, it is expected that the
misclassification would increase linearly and correct classification would
increase proportionally to the area. Also, better choices of training areas
would probably be available. Expanding the site size would also provide a means
of checking the stability of the statistics calculated for the 33386 pixel test site.

The discussion of these evaluation criteria and results provides a means
of establishing a statistical base for determining the performance of various
classification techniques on different types of data sets and for various remote
sensing discipline applications. However, these criteria provide relatively little
information concerning the goodness of a CM. The tabular results provide
information only on how many pixels fall in each category, whereas a map provides
this information as well as where the pixels are located. The next section
addresses modification of the contingency table to provide information on the
spatial complexity of the test site, on where misclassification errors occur, and
on how well the CM agrees with the GTM.


IV. PROPOSED STATISTICAL TESTS FOR EVALUATING 

CM AND/OR GTM 


Although the previous tests contain relatively little information on the
goodness of maps produced by various classification techniques, the tests can
be adapted to provide some measure of map goodness. One possible clue as to
what approach should be taken to adapt these tests is that CM with identical
inventory and classification accuracies can appear quite different visually.
Thus, for two such CM, the best choice would appear to be to select the map
whose homogeneous areas and boundaries coincide best with the GTM homogeneous
areas and boundaries. A proposed quantitative approach to making this selection
is to produce an 8 by 8 contingency matrix to replace each individual element in
the contingency table of GTM versus CM. The model used to provide numbers
for the contingency matrix in the contingency table is as follows.

Let $x_{i,j}$ be the reference sample on the CM and $y_{i,j}$ be the reference
sample on the GTM at scan i and column j. Let $x_{i-1,j}$ and $x_{i,j-1}$ be two
test samples adjacent to the reference sample on the CM at scans and columns
i-1,j and i,j-1, respectively, and let $y_{i-1,j}$ and $y_{i,j-1}$ be the corresponding
test samples on the GTM.

Several comparisons can be made between the reference and test samples
on either the GTM or CM and between corresponding samples on the GTM and
CM. For example, a vertical (horizontal) boundary would be indicated on the
CM if pixel $x_{i,j}$ belongs to a different class than pixel $x_{i,j-1}$ ($x_{i-1,j}$). The
same is true for the GTM if x is replaced by y. A homogeneous pixel area
occurs when $x_{i,j}$, $x_{i-1,j}$, and $x_{i,j-1}$ belong to the same class on the CM.
The same is also true for the GTM if x is replaced by y. A double boundary
occurs when the reference sample disagrees with both test samples on either
the CM or GTM. Comparisons also have to be made between the GTM and CM
to determine how many agreements there are concerning the three corresponding
pixels. In constructing the 8 by 8 matrix, the upper half will contain entries
when the reference samples belong to the same class on the GTM and CM, the
lower half will contain entries when the reference samples disagree on the GTM
and CM, the left half of the matrix will contain entries when either or both of
the test samples agree on the GTM and CM, and the right half of the matrix will
contain entries when either or both of the test samples disagree on the GTM and
CM. A pictorial description of the 8 by 8 contingency matrix is shown in the
Figure, and an explanation of the column and row labeling, as well as the entry
values, follows.


                     Test Samples Agree              Test Samples Disagree
GTM/CM      1.1     1.2     1.3     1.4      2.1     2.2     2.3     2.4

Reference Samples Agree
1.1         3       2       2       1        0       1       1       2
1.2         2       3,2     1       2,1      1       0,1     2       1,2
1.3         2       1       3,2     2,1      1       2       0,1     1,2
1.4         1       2,1     2,1     3,2,1    2       1,2     1,2     0,1,2

Reference Samples Disagree
2.1         0       1,0     1,0     2,1,0    3       2,3     2,3     1,2,3
2.2         1,0     1,0     2,1,0   2,1,0    2,3     2,3     1,2,3   1,2,3
2.3         1,0     2,1,0   1,0     2,1,0    2,3     1,2,3   2,3     1,2,3
2.4         2,1,0   2,1,0   2,1,0   2,1,0    1,2,3   1,2,3   1,2,3   1,2,3

Figure. 8 by 8 contingency matrix.

The row or GTM label definitions are:

1.1: The reference samples agree on the GTM and CM. There is no feature change
in either the scan or column direction (homogeneous pixel area).

1.2: The reference samples agree on the GTM and CM. There is a feature change
in the scan direction only (vertical boundary).

1.3: The reference samples agree on the GTM and CM. There is a feature
change in the column direction only (horizontal boundary).

1.4: The reference samples agree on the GTM and CM. There is a feature
change in the scan and column directions (double boundary).

The row definitions of 2.1, 2.2, 2.3, and 2.4 are identical to the above,
except that the reference samples on the GTM and CM disagree. The column or
CM labels 1.1, 1.2, 1.3, 1.4 are identical to those previously defined, but 2.1,
2.2, 2.3, 2.4 refer to test sample disagreements. The entry values in the
matrix range from zero to three. The left half of the matrix contains the number
of agreements on the GTM and CM concerning the three pixel locations, and the
right half contains the number of disagreements. For example, every time a
1.1 condition is encountered on the GTM and CM, all three pixels are in
agreement and a three is added to the sum (which is initially zero for all elements)
contained in matrix element 1.1, 1.1. Notice that 1.1, 2.1 and 2.1, 1.1 are
impossible situations and always contain zero. In the case where 1.4, 1.4 is
encountered, there can be 3, 2, or 1 agreements which are added to the sum in
matrix element 1.4, 1.4 and there can be 0, 1, or 2 disagreements, respectively,
which are added to the sum of matrix element 1.4, 2.4.
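
The accumulation rule described above can be sketched in Python as follows. This is an illustrative reading of the text, not the original program: the function names and the small example maps are hypothetical, and pixels in the first scan and first column are skipped because they lack one of the two test samples, which is an assumption the report does not spell out.

```python
# Small hypothetical maps; each row is a scan, each entry a class label.
GTM = [[1, 1, 2],
       [1, 1, 2],
       [3, 3, 2]]
CM = [[1, 1, 2],
      [1, 2, 2],
      [3, 3, 2]]


def structure(grid, i, j):
    """Second digit of a label: 1 homogeneous, 2 change in the scan direction
    (vertical boundary), 3 change in the column direction (horizontal
    boundary), 4 both (double boundary)."""
    ref = grid[i][j]
    scan_change = ref != grid[i][j - 1]      # same scan, previous column
    column_change = ref != grid[i - 1][j]    # previous scan, same column
    return 1 + int(scan_change) + 2 * int(column_change)


def contingency_matrices(gtm, cm):
    """Return {(gtm_class, cm_class): 8 by 8 matrix} for the two maps."""
    tables = {}
    for i in range(1, len(gtm)):
        for j in range(1, len(gtm[0])):
            locations = ((i, j), (i - 1, j), (i, j - 1))
            agreements = sum(gtm[a][b] == cm[a][b] for a, b in locations)
            # Upper half (rows 0-3): reference samples agree; lower half: disagree.
            half = 0 if gtm[i][j] == cm[i][j] else 4
            row = half + structure(gtm, i, j) - 1
            col = structure(cm, i, j) - 1
            matrix = tables.setdefault((gtm[i][j], cm[i][j]),
                                       [[0] * 8 for _ in range(8)])
            matrix[row][col] += agreements          # left half: agreements
            matrix[row][col + 4] += 3 - agreements  # right half: disagreements
    return tables
```

Since every pixel contributes three to the combined left and right halves, the sum of all entries divided by three recovers the number of pixels processed.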





To construct the contingency table using the 8 by 8 contingency matrix, it
is necessary to use only half of the 8 by 8 matrix for each table element. Thus,
for the diagonal elements of the table, only the upper half of the matrix is used
because the bottom half will be all zeros. For the off-diagonal elements of the
table, only the lower half of the matrix is used because the top half will be all
zeros. Hence, each element in the contingency table is replaced by a 4 by 8
contingency matrix. Notice that the original values of the single element
contingency table can be obtained by adding the right half of the 4 by 8 matrix to the
left half for each table entry, computing the sum of all of the elements of the
resulting 4 by 4 matrix, and dividing by three. Therefore, the contingency matrix
not only contains the same information as the contingency table, but it also
contains a considerable amount of information related to the structure of the CM.

There are several types of map structure information that can be
obtained from the 4 by 8 contingency matrices. By adding the right half of the
4 by 8 matrix to the left half and dividing all of the elements of the resulting
4 by 4 matrix by three, the 4 by 4 matrix will contain the number of homogeneous
pixels, vertical boundaries, horizontal boundaries, and double boundaries on the
diagonal elements for correctly classified pixels. The off-diagonal elements of
the matrix contain the number of errors where feature changes occurred on the
CM, but did not occur on the GTM or vice versa. Previous work done on
identifying the major source of classification errors has indicated that the
majority of misclassification occurs at a boundary between two or more
different features. The matrix will help narrow down what type of boundaries
produce the most errors. By not adding the right half to the left half of the 4 by
8 matrix, it is possible to determine, for those elements having only two possible
values, the number of events having each value. This is not possible with the
matrix elements that can have three values.
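
The folding operations described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the software
used for this report: the function names fold_matrix and table_entry and the
toy 4 by 8 matrix are hypothetical, and the column layout (1.1 through 1.4 in
the left half, 2.1 through 2.4 in the right half) is assumed from the text.
The division by three reflects the three agreement or disagreement counts that
each pixel appears to contribute.

```python
def fold_matrix(m48):
    """Fold a 4 by 8 contingency matrix into a 4 by 4 structure matrix.

    Adds the right half (columns 2.1-2.4) to the left half (columns 1.1-1.4)
    and divides every element by three.
    """
    folded = []
    for row in m48:
        if len(row) != 8:
            raise ValueError("each row must have 8 entries")
        folded.append([(row[j] + row[j + 4]) / 3 for j in range(4)])
    return folded


def table_entry(m48):
    """Recover the single-element contingency table value (a pixel count)."""
    return sum(sum(row) for row in fold_matrix(m48))


# Hypothetical toy matrix for illustration only (not data from the report).
toy = [
    [30, 4, 2, 0, 0, 2, 1, 0],
    [2, 6, 0, 2, 1, 3, 0, 1],
    [0, 2, 4, 0, 0, 1, 2, 0],
    [2, 0, 0, 7, 1, 0, 0, 5],
]

structure = fold_matrix(toy)   # 4 by 4 map structure matrix
pixels = table_entry(toy)      # value of the original table element
```

In this sketch the diagonal of structure holds the counts of homogeneous
pixels, vertical boundaries, horizontal boundaries, and double boundaries, and
pixels is the sum of all sixteen elements.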

By comparing a GTM with itself, the contingency table will contain only
diagonal elements and these diagonal elements will contain 4 by 4 matrices
(which are the upper left quarter of the original 8 by 8 matrices) that are
themselves diagonal. These 4 by 4 diagonal matrices provide a means of
determining the spatial complexity of each feature in terms of the number of
observed homogeneous pixels and various types of pixel boundaries. By adding all
of the 4 by 4 diagonal matrices, a general measure of spatial complexity can be
obtained for the entire GTM independent of feature. These measures are the
expected distributions that can be used in various tests for comparing the CM
(observed distributions) with the GTM to determine how well the spatial
complexities agree. Thus, comparing spatial complexities provides a means of
selecting the best CM from several maps that have similar inventory and
classification accuracies. Comparisons can also be made between the various CM
as well as comparing a CM with itself, if that type of information is desired.
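
The text does not name the tests it has in mind. As one possible choice, the
sketch below computes a Pearson chi-square statistic between an expected (GTM)
and an observed (CM) distribution of the four pixel configurations. The choice
of test and the function name chi_square are assumptions; the expected counts
are the urban GTM counts quoted in this report for Table 10, and the observed
counts are hypothetical.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic for two lists of counts.

    The expected counts are rescaled to the observed total so that maps
    with different pixel totals can still be compared.
    """
    if len(observed) != len(expected):
        raise ValueError("distributions must have the same length")
    scale = sum(observed) / sum(expected)
    stat = 0.0
    for o, e in zip(observed, expected):
        e_scaled = e * scale
        if e_scaled <= 0:
            raise ValueError("expected counts must be positive")
        stat += (o - e_scaled) ** 2 / e_scaled
    return stat


# Order: homogeneous, vertical, horizontal, double boundary (1.1 to 1.4).
gtm_urban = [66, 29, 23, 158]   # urban GTM counts quoted in the report
cm_urban = [80, 40, 30, 126]    # hypothetical CM counts, illustration only

stat = chi_square(cm_urban, gtm_urban)
```

Under this assumed test, a smaller statistic indicates closer agreement in
spatial complexity, so several candidate CMs could be ranked by their values.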




Table 10 shows the contingency matrix for comparing the GTM with
itself. For the urban category, which contains 276 pixels, there were 66 urban
pixels (23.91 percent of the urban pixels) that had an urban pixel directly above
it (previous scan, same column) and an urban pixel directly to the left of it
(same scan, previous column). There were also 29 urban pixels (10.5 percent)
that had an urban pixel directly above it and no urban pixel directly to the left
(vertical boundary), 23 urban pixels (8.33 percent) that had an urban pixel
directly to the left of it and no urban pixel directly above (horizontal boundary),
and 158 urban pixels (57.24 percent) that had no urban pixels directly above or
to the left (double boundary). In describing the features on the GTM, it could
be said that the urban feature is 23.9 percent homogeneous, transportation is
4 percent homogeneous, agriculture is 77.5 percent homogeneous, forest is
73.1 percent homogeneous, and water is 18.3 percent homogeneous. There is
a correspondence between the homogeneity of a feature and the feature
classification accuracy in that the more homogeneous features appear to be more
accurately classified.
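
The urban percentages quoted above follow directly from the four counts. The
short Python sketch below repeats that arithmetic; the variable and function
names are hypothetical, and the last digit may differ slightly from the
report's figures depending on whether values are rounded or truncated.

```python
# Urban GTM counts quoted in the text for Table 10.
urban_counts = {
    "1.1 homogeneous": 66,
    "1.2 vertical boundary": 29,
    "1.3 horizontal boundary": 23,
    "1.4 double boundary": 158,
}


def percentages(counts):
    """Express each configuration count as a percentage of the feature total."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


urban_pct = percentages(urban_counts)
for name, pct in urban_pct.items():
    print(f"{name}: {pct:.2f} percent")
```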

TABLE 10. GTM/GTM CONTINGENCY MATRIX FOR EACH FEATURE 



Table 11 shows the contingency matrix for all of the GTM features 
combined. 




TABLE 11. GTM/GTM CONTINGENCY MATRIX FOR ALL FEATURES



The table indicates that the entire map is 71.1 percent homogeneous,
which corresponds very closely with the classification accuracies presented in
Tables 2 through 6. Thus, it appears that the homogeneity percentage for the
GTM could be used as a good estimate of expected minimum classification
accuracy. Table 11 also indicates that it may be worthwhile to consider using
spatial information in the classifier because 91 percent of the pixels belong to
the same feature as the previous pixel in the same scan or same column.

Table 12 shows the contingency matrix of MLCM/GTM for each feature.
The diagonal and row percentages for correct classification are obtained by
ratioing the diagonal elements and row sums of the diagonal matrices in Table
12 with the elements in Table 10. For forest, the diagonal percentages show
that 64.2 percent of the time, the reference pixel was correctly classified
when the previous pixels in the same scan and same column were also correctly
classified as forest. For the case where the reference pixel and the previous
pixel in the same column were correctly classified as forest, but the previous
pixel in the same scan was categorized as belonging to another feature, the
success was only 16.8 percent. In the case of a horizontal boundary for forest,
the success in correct classification was only 13.5 percent, and for the case of
a double boundary for forest the success was only 11.4 percent. This situation
seems to be typical for large homogeneous areas, indicating that the interior
pixels tend to be more correctly classified than the transition or boundary
pixels between two or more features. If the constraint is removed that the
previous pixels in the same scan and same column on the CM have to agree with
the class configuration of the corresponding pixels on the GTM, then the row
percentages show that for forest, when there is no feature change in the
previous pixels on the GTM, the reference pixel on the CM is correctly
classified 86.2 percent of the time.

The situation appears to be different for highly linear features such as
transportation/communication (t). In this case, the diagonal and row
percentages are higher when a feature change is present in the previous pixels.
This is probably due to high data contrast between roadways and power line
rights-of-way versus forested areas.












TABLE 12. MLCM/GTM CONTINGENCY MATRIX FOR EACH FEATURE 





For correct classifications only.
















































It also appears that the effect of banding can be observed by examining
how the diagonal and row percentages change for the 1.2 and 1.3 cases. If the
class configuration is preserved on the CM and GTM (diagonal percent), the
classification accuracy is higher for 1.2 (vertical boundary). However, if the
class configuration is ignored on the CM, the classification accuracy is higher
for 1.3 (horizontal boundary on GTM). Both situations are supported by the fact
that banding is observed as a horizontal phenomenon produced by data changes
in the vertical direction.

Table 13 is a summary of the information in Table 12 for all features.
The diagonal and row percentages were obtained by ratioing the diagonal
elements and row sums of Table 13 with the elements of Table 11. The total
diagonal percentage was obtained by ratioing the sum of the diagonal elements
in Table 13 with the sum of the diagonal elements in Table 11. The diagonal
and row percentages indicate essentially the same results as previously
mentioned. However, it is interesting to compare three types of classification
accuracy based upon different constraints. For the MLCM, Table 2 shows that if
the total number of pixels for each feature on the CM (regardless of where they
occur) is compared with the total number of pixels for each feature on the
GTM, then the inventory accuracy is 81.94 percent. If the constraint is added
that the CM feature pixels are correct if they agree with the GTM feature
pixels at the same location, then the classification accuracy is 71.67 percent.
If a constraint is added that feature changes on CM and GTM have to agree
together with the correctly classified pixels, then a measure of the map
accuracy is 45.77 percent as indicated by the total diagonal percentage in
Table 13.
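
The relationships among these percentages can be cross-checked from the
Table 13 values alone. In the Python sketch below, the counts and percentages
are as read from Table 13. The GTM counts are not taken from Table 11; they are
approximate values back-calculated from the reported row percentages, which is
an assumption made here purely for illustration, and all names are hypothetical.

```python
# MLCM/GTM counts for all features, as read from Table 13 (rows are GTM
# configurations 1.1 to 1.4; columns are MLCM configurations 1.1 to 1.4).
cm = {
    "1.1": [13938, 2439, 1610, 1069],
    "1.2": [818, 660, 180, 317],
    "1.3": [779, 227, 380, 278],
    "1.4": [463, 307, 159, 304],
}
row_pct = {"1.1": 80.30, "1.2": 52.27, "1.3": 60.48, "1.4": 39.43}
diag_pct = {"1.1": 58.73, "1.2": 17.46, "1.3": 13.81, "1.4": 9.72}

labels = list(cm)
diag = {k: cm[k][i] for i, k in enumerate(labels)}
row_sum = {k: sum(cm[k]) for k in labels}

# ASSUMPTION: approximate GTM counts inferred from row sum / row percentage,
# not values copied from Table 11.
gtm_est = {k: row_sum[k] / (row_pct[k] / 100.0) for k in labels}
gtm_total = sum(gtm_est.values())

# Percentages recomputed from the inferred GTM counts.
diag_pct_est = {k: 100.0 * diag[k] / gtm_est[k] for k in labels}
total_diag_pct = 100.0 * sum(diag.values()) / gtm_total        # map accuracy
classification_acc = 100.0 * sum(row_sum.values()) / gtm_total  # same location
homogeneity = 100.0 * gtm_est["1.1"] / gtm_total                # GTM 1.1 share

print(f"total diagonal percentage: {total_diag_pct:.2f}")
print(f"classification accuracy:   {classification_acc:.2f}")
print(f"GTM homogeneity:           {homogeneity:.1f}")
```

Under these assumptions the recomputed figures should land close to the
45.77 percent, 71.67 percent, and 71.1 percent quoted in the text, which gives
a simple consistency check on the tabulated counts.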


TABLE 13. MLCM/GTM CONTINGENCY MATRIX FOR ALL FEATURES 


                      MLCM (All Features)
GTM         1.1     1.2     1.3     1.4     Diagonal       Row
                                            Percentages    Percentages

1.1       13938    2439    1610    1069       58.73          80.30
1.2         818     660     180     317       17.46          52.27
1.3         779     227     380     278       13.81          60.48
1.4         463     307     159     304        9.72          39.43

Total Diagonal Percentage                     45.77





APPROVAL 


EVALUATION CRITERIA FOR SOFTWARE CLASSIFICATION 
INVENTORIES, ACCURACIES, AND MAPS

By Robert R. Jayroe, Jr. 


The information in this report has been reviewed for security
classification. Review of any information concerning Department of Defense or
Atomic Energy Commission programs has been made by the MSFC Security
Classification Officer. This report, in its entirety, has been determined to be
unclassified.


This document has also been reviewed and approved for technical 
accuracy. 


. T. POWELL 

Director, Data Systems Laboratory 






